The field of student assessment, from methodology and techniques to the use of
results, is changing, and these changes are dramatically affecting the work of
education employees.

On one hand, these changes have created new options. For example, classroom 
assessment instruments have expanded to include assessments based on portfolios, 
projects, and performances. Teachers now assess a student’s performance based on 
predetermined criteria more closely aligned with the instructional objectives of the 
lesson and tailor instruction more specifically to individual students. Students 
become partners with the teacher in assessment by having access to these criteria at 
the beginning of the lesson. Classroom assessment is truly becoming an integral 
part of the instructional program as more and more teachers add these assessment 
techniques to their repertoire. 

On the other hand, changes in student assessment have created new concerns, 
especially in the use of assessment results. Today, assessment results are being used 
for more than comparing an individual student’s performance against a state or 
national norm, and for more than providing data for making program improvement 
decisions. They are being used to determine the success or failure of teachers and 
schools. Policy makers and others are using large-scale assessments to decide 
whether teachers and schools are providing an adequate education to all students 
and attaching consequences, positive and negative, on the basis of student assessment
results. The use of student test scores has raised the stakes for all education
employees. 

Consequently, student assessment is part of every teacher’s work. In fact, nearly
one-third of a classroom teacher’s time is spent assessing and evaluating students.
Many influential groups have identified competence in student assessment as essential
for the training and licensing of new teachers and the upgrading of the skills of
practicing teachers (National Board for Professional Teaching Standards, Interstate
New Teacher Assessment Consortium, National Council for Accreditation of
Teacher Education, Educational Testing Service, and the National Association of
State Directors of Teacher Education and Certification). These groups estimate that
fewer than half of currently practicing teachers have received adequate training
in student assessment.

To help members and other educators keep abreast of the ever-changing field of
student assessment, the National Education Association (NEA) commissioned leading
assessment experts to write about student assessment from their perspectives.
Experts Jay McTighe and Steven Ferrara, the authors of this book on classroom-based
assessment of learning, believe that “the primary purpose of classroom
assessment is to inform teaching and improve learning, not to sort and select
students or to justify a grade.” In this book, a revised edition of an NEA publication
printed in 1994, they discuss principles of effective classroom assessment, illustrate
a variety of assessment approaches and methods, and provide a framework for
planning. Their readable, practical approach to an evolving and sometimes complex
subject is intended to be of use to teachers at all levels, preschool through graduate
studies, as well as to other education employees.

The NEA developed the Student Assessment Series to help teachers and other 
education employees improve their knowledge and skills in student assessment and 
hopes readers will find the series a valuable resource for current and future student 
assessment practices. 

— Glen W. Cutlip 
Series Editor 
ONGOING ASSESSMENT
OF STUDENT LEARNING 
Ongoing assessment of student learning in the classroom is an essential aspect of
effective teaching. Teachers can use a variety of assessment methods to diagnose 
students’ strengths and needs, plan and adjust instruction, and provide feedback to 
students and parents regarding progress and achievement. The basic premise of this 
book is that the primary purpose of classroom assessment is to inform teaching and 
improve learning, not to sort and select students or to justify a grade. 

The book is intended for teachers from the preschool to graduate school levels to
use in examining a variety of methods for effectively and fairly assessing their
students. While the choice of particular assessment methods will vary according to
the purpose of the assessment, the content of the curriculum, and the age levels of
students, a set of common principles underlies effective classroom assessment. This
book covers these principles, describes the strengths and limitations of a variety
of assessment approaches, presents a series of vignettes to illustrate classroom
assessment in action, and offers a set of guiding questions and a framework for
planning classroom assessments to improve teaching and learning.

Teachers frequently begin new units of study by introducing or reviewing key
vocabulary, recognizing that an understanding of certain basic concepts will
enhance subsequent learning of important principles and procedures in the unit.
Likewise, this book begins with a review of basic terminology commonly associated
with classroom assessment. (Additional assessment terminology is provided
throughout the book, and definitions of related terms important to the topic of
classroom assessment are provided in the Glossary.)

Assessment refers to “any systematic basis for making inferences about
characteristics of people, usually based on various sources of evidence; the global
process of synthesizing information about individuals in order to understand and
describe them better” (Brown 1983). In considering this textbook definition, it is
interesting to note that the term assessment is derived from the Latin root assidere,
meaning “to sit beside.” Although this original meaning may seem at odds with
present-day images of large-scale standardized testing (number 2 pencils, “bubble”
sheets, rigid time limits, silent work, and so on), it conforms more closely to the
array of assessment methods routinely used by teachers for assessing their students.
Assidere suggests that, in addition to tests and projects, classroom assessments
include informal methods of “sitting beside,” observing, and conversing with
students as a means of understanding and describing what they know and can do.

The terms assessment, testing, and evaluation are frequently used interchangeably,
but they have distinct meanings. Assessment is a broad term referring to the process
of gathering and synthesizing information to better understand and describe
characteristics of people. Testing is one type of assessment. Tests generally use a
paper-and-pencil format, are administered and taken within established time limits,
restrict test takers’ access to resources (e.g., reference materials), and yield a
limited range of acceptable responses. Evaluation involves making a judgment
regarding quality, value, or worth, based on set criteria. Teacher questioning,
reviews of student work folders, and paper-and-pencil tests are commonly used
assessment methods for gathering information about student learning. Scoring a
student essay and assigning report card grades are examples of evaluation.

Another pair of widely used terms, summative assessment and formative assessment,
pertains to the purpose and timing of classroom assessments. Summative assessment
refers to any culminating assessment that provides a summary report on the degree
of knowledge or proficiency attained at the conclusion of a unit, course, or program
of study. A final exam, a senior exhibition, and a dissertation defense are examples
of summative assessments. Formative assessment refers to any ongoing diagnostic
assessment that provides information to help teachers adjust instruction and improve
student performance. For instance, prior to the start of a unit on the Civil War, a
teacher might ask students to make a “web” or an outline to show what they already
know about this period of history as a means of obtaining information about
students’ prior knowledge. The teacher might also randomly select and interview
several students to check their perceptions and awareness of the Civil War.
Formative assessment also can be used during instruction to check on student
understandings and misconceptions. Teachers often use brief written and oral
quizzes and classroom discussions to determine if students have learned course
material and can apply the skills they have been taught. Such activities provide
teachers with valuable information that allows them to adjust instruction to improve
student learning.

Although the term alternative assessment appears widely in the recent education
literature (Herman, Aschbacher, and Winters 1992), there is no universally
agreed-upon definition for the term alternative. Generally, alternative assessment
is used to refer to those assessments that differ from the multiple-choice, timed,
“one-shot” approaches that characterize most standardized and some classroom
assessments. The term should be avoided because it is imprecise and open to various
interpretations.





LARGE-SCALE VERSUS 
CLASSROOM ASSESSMENT




Different types of assessments address different information needs. The purposes
and audiences for assessment information influence what is assessed, how it is
assessed, and how the results are communicated and used. Large-scale assessments
have very specific purposes. For example, standardized tests, such as the Iowa Tests
of Basic Skills (ITBS), California Achievement Tests (CAT), and the Stanford
Achievement Test, are used primarily to satisfy demands for educational
accountability. The results of assessments such as these are reported to legislatures,
boards of education, school administrators, parents, and the general public.
Standardized tests are generally norm referenced to allow for easy interpretation.
They are designed to determine how well students have learned particular concepts
and skills as compared to other students in a norming group. The results of
norm-referenced assessments may be conveniently displayed so that observers can
readily distinguish achievement above or below the norm.

Not all large-scale standardized tests are norm referenced. Some, such as the
College Board’s Advanced Placement Examinations, the National Assessment of
Educational Progress (NAEP), and certain state-level competency tests, are
criterion referenced. These tests evaluate and report student performance compared
to preestablished standards.

Furthermore, not all standardized tests are multiple-choice in nature. Several 
states currently use standardized performance assessments, featuring open-ended 
tasks, for “high stakes” accountability purposes. 

Standardized tests are considered “high stakes tests” if their results are used for
consequential decisions such as promotion, graduation, admission, certification, or
evaluation, or when rewards and sanctions are involved. For example, a districtwide
minimum competency exam would be “high stakes” for students if passing the exam is
a requirement for a high school diploma. For an extended discussion of standardized
testing, see The Role of High-Stakes Testing in School Reform (Smith 1993). Because
they are intended to provide accountability information, “one-shot” large-scale
standardized tests typically do not provide sufficiently detailed or timely
information regarding student achievement of specific curriculum goals.





Classroom assessments serve other purposes and audiences. At the classroom level,
teachers have different assessment needs: diagnosing student strengths and
weaknesses, informing students and parents about progress, planning and adjusting
instruction, and motivating students to focus on valued knowledge and skills. With
these purposes in mind, classroom assessments may be tailored directly to the
curriculum and to the information needs of individual teachers, students, and
parents. Unlike “one-shot” standardized tests, assessments designed to promote
learning in the classroom are more likely to be used over time, include an array of
methods, focus on elements of quality, offer a more personalized picture of student
achievement, and provide timely and specific feedback.








EFFECTIVE CLASSROOM ASSESSMENT 



A large variety of methods is available to teachers for assessing student learning
(Airasian 1991; Cross and Angelo 1988; Ferrara and McTighe 1992; Stiggins 1994).
Regardless of the particular methods employed, effective classroom assessment is
guided by three fundamental principles. Classroom assessment should (1) inform
teaching and improve learning; (2) use multiple sources of information; and (3)
provide valid, reliable, and fair measurements.

The first principle is based on the premise that the primary purpose of classroom
assessment is to inform teaching and improve learning (Mitchell and Neill 1992).
Thus, effective classroom assessment must be an ongoing process instead of a single
event at the conclusion of instruction. Rather than waiting until the end of a unit
of study or course to assess students, effective teachers employ formative
assessments at the beginning of instruction to determine students’ prior knowledge,
and they assess regularly throughout the unit or course of study to obtain
information to help them adjust their teaching based on the learning needs of
students. They recognize that assessment results can inform them about the
effectiveness of their teaching as well as the degree of student learning.

When using performance-based assessments, teachers can make their evaluative
criteria explicit in advance to serve as a focus for both instruction and
evaluation. Effective teachers help their students understand that the criteria
describe the desired elements of quality. They provide regular feedback to students
based on the identified criteria and allow students to revise their work based upon
this feedback. They also involve students in peer- and self-evaluation using the
criteria in order to engage students more actively in improving their performance.



Assessment for learning recognizes the mutually supportive relationship between
instruction and assessment. Like a Möbius strip, on which one side appears to blend
seamlessly into the other, classroom assessment should reflect and promote good
instruction. For example, teachers following a process approach to teaching writing
would allow their students to develop drafts, receive feedback, and make revisions
as part of the assessment. Likewise, if teachers teach science through a hands-on,
experimental approach, their assessment should include hands-on investigations.

The second principle of sound classroom assessment calls for a synthesis of
information from several sources. The importance of using multiple sources of
information when assessing learning in the classroom can be illustrated through the
analogy of taking photographs. A single assessment, such as a written test, is like a
snapshot of student learning. While a snapshot is informative, it is generally
incomplete since it portrays an individual at a single moment in time within a
particular context. It is inappropriate to use one snapshot of student performance
as the sole basis for drawing conclusions about how well a student has achieved
desired learning outcomes. Classroom assessment offers a distinct advantage over a
large-scale assessment in that it allows teachers to take frequent samplings of
student learning using an array of methods. To continue the analogy of taking
photographs, classroom assessment provides an opportunity to construct a “photo
album” containing a variety of pictures taken at different times with different
lenses, backgrounds, and compositions. The photo album reveals a richer, more
complete picture of each student than any single snapshot can provide. Applying the
principle of multiple sources is especially important when the assessment
information is used as a basis for making critical summative decisions, such as
assigning report card grades or determining promotion.

The third principle of classroom assessment concerns validity, reliability, and
fairness. Validity refers to the degree to which an assessment measures what it was
intended to measure. For example, to assess students’ abilities to conduct research
using primary and secondary sources, a media specialist should observe students’
use of these sources directly as they work on their research projects. For this
learning outcome, a paper-and-pencil test of student knowledge of library references
would be an indirect and less valid assessment since it does not reveal the ability
to actually use the references purposefully.

Reliability refers to the dependability and consistency of assessment results. If
the same assessment yielded markedly different results with the same students
(without intervening variables such as extra instruction or practice time), one would
question its reliability. Performance assessments present a special challenge since
they call for judgment-based evaluation of student products and performances. A
reliable evaluation would result in equivalent ratings by the same rater on different
occasions. For instance, an observation checklist can be used reliably as long as
teachers are careful to ensure that their ratings would not differ substantially from
occasion to occasion (e.g., Monday morning versus Friday afternoon). When teachers
are involved in school- or district-level evaluations based on a set of criteria used
throughout the school or district, inter-rater reliability must also be considered. In
this case, scores on a writing assessment would be considered reliable if different
raters assign similar scores to the same essays.
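One simple way to check inter-rater reliability is to count how often two raters agree. The Python sketch below assumes hypothetical scores from two raters on eight essays scored with a four-point rubric and reports exact agreement (identical scores) and adjacent agreement (scores within one point). The function name and data are illustrative only; more formal statistics, such as Cohen’s kappa, also adjust for agreement that would occur by chance.

```python
def agreement_rates(rater_a, rater_b):
    """Return (exact, adjacent) agreement as fractions of the essays scored.

    exact:    both raters gave the same score.
    adjacent: scores differ by at most one point (exact matches included).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical scores from two raters on the same eight essays (four-point rubric).
rater_a = [4, 3, 3, 2, 1, 4, 2, 3]
rater_b = [4, 3, 2, 2, 1, 4, 4, 3]

exact, adjacent = agreement_rates(rater_a, rater_b)
print(exact, adjacent)  # 0.75 0.875
```

Here the raters agree exactly on six of eight essays and within one point on seven; the essay scored 2 by one rater and 4 by the other would be a natural candidate for discussion against the rubric’s criteria and anchors.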

Fairness in classroom assessment refers to giving all students an equal chance to
show what they know and can do. Fairness is compromised when teachers assess
something that has not been taught or use assessment methods that are incongruent
with instruction (e.g., asking for recall of facts when the emphasis has been on
reasoning and problem solving). The fairness of teacher judgments is also challenged
by the “halo” and “pitchfork” effects, where expectations based on a student’s past
attitude, behavior, or previous performance influence the evaluation of his or her
current performance.

Subtle, unintended racial, ethnic, religious, or gender biases also present
roadblocks to the fair assessment of students. Such biases may negatively influence
students’ attitudes toward, and performances on, classroom assessments. For example,
the junior high mathematics teacher who routinely uses sports statistics as a main
source for problem-solving tasks could turn off those students who are not sports
fans. Likewise, insensitivity to diverse religious beliefs (e.g., choosing reading
passages involving only Christian holidays), gender/racial images (e.g., depicting
all doctors as white males), or socioeconomic status (e.g., assuming that all kids
have access to a telephone or home computer) may result in unfair evaluation of
individuals or groups. Teachers must be on guard so that biases do not influence
their evaluations of a student’s performance.

After teachers consider these three general principles, they should address some 
fundamental questions related to planning classroom assessments. 
Just as teachers have numerous instructional techniques and strategies from
which to choose, they also have a variety of methods available for assessing
learning. Teachers can determine which assessment methods to use by responding to
several key questions (see Figure 1).

The first question, under standards/benchmarks, concerns content standards, or
the intended results of the teaching. Teachers should ask: “What do we want students
to understand and be able to do?” Content standards typically fall into three
categories: (1) declarative knowledge, that is, what we want students to understand
(facts, concepts, principles, generalizations); (2) procedural knowledge, that is,
what we want students to be able to do (skills, processes, strategies); and (3)
attitudes, values, or habits of mind, that is, how we would like students to be
disposed to act (e.g., appreciate the arts, treat people with respect, avoid
impulsive behavior). The choice of specific assessment methods should be determined
in large part by the nature of the content standards being assessed (Marzano,
Pickering, and McTighe 1993). For example, to assess students’ ability to write an
effective persuasive essay, the assessment should involve gathering samples of
students’ persuasive writing and evaluating them against specified criteria. In this
case, a multiple-choice test would be ill-suited to measure the intended outcome.
Likewise, to assess students’ ability to work cooperatively on a research project,
the assessment should address group processes and products as well as individual
performance.

In addition to considering content standards, teachers need to raise questions 
about the purpose(s) and audience(s) for classroom assessments. They should ask: 
“Why are we assessing and how will the assessment information be used? For whom 
are the assessment results intended?” Purpose and audience influence not only the 
assessment methods selected, but also the ways in which the results of classroom 
assessments are communicated. For example, to provide parents of a primary-grade 
student with an interim report of progress in language arts, the teacher might arrange 
a conference to describe the child’s reading skills in terms of a developmental profile
and review a work folder containing samples of the child’s writing.
ASSESSMENT APPROACHES
AND METHODS 



After teachers identify content standards, purposes, and audiences, they need to
select assessment approaches and methods (see Figure 2, which provides a systematic
guide to selecting assessment methods). Teachers will choose from two formats
(selected-response format and constructed-response format) and the five assessment
approaches within the two formats. They will select approaches according to whether
they want students to:

1. select a response

2. construct a response

3. create a product

4. provide an observable performance, or

5. describe their thinking/learning process

Each of these formats and approaches has its strengths and limitations, and it is
instructive to look at classroom examples of how teachers use the various
assessment methods.



Selected-response formats, which include multiple-choice, true-false, matching,
and enhanced multiple-choice items, are widely known and used in educational
testing, especially at the secondary and post-secondary levels (Stiggins and Conklin
1992). Multiple-choice items are the most common type of selected-response item,
appearing on most commercially produced tests, as well as on many classroom
assessments. The selected-response format presents students with a question,
problem, or statement followed by a set of alternative responses. Students make a
selection from among the given alternatives rather than generate their own response.
While most selected-response items have a single correct or best response, it is
possible to create “enhanced multiple-choice” items that have more than one
acceptable answer contained among the alternatives.

Selected-response formats have a number of advantages. They allow teachers to
efficiently and objectively assess students’ knowledge of factual information,
concepts and principles, and the application of basic skills. Because assessments
using these formats can accommodate a large number of items, they enable a teacher
to sample a broad range of knowledge and skills in a limited amount of time. Because
selected-response items contain correct or acceptable responses, they are easily and
objectively scored as correct or incorrect using an answer key. Machine-scorable
answer sheets and hand-scoring templates simplify the scoring process, allowing
teachers to quickly obtain results for timely feedback to students.

Figure 2

Despite their advantages, assessments using selected-response formats have
limitations. Instead of assessing the application of knowledge and higher-order
skills in meaningful “real world” situations, they tend to assess knowledge and
skills in isolation and out of context. Selected-response items cannot adequately
measure certain learning outcomes, such as critical thinking, creativity, oral
communication, and social skills. Real-world issues and problems rarely have single
correct answers, so the widespread use of assessments with selected-response formats
may communicate to students an unintended message about the nature of knowledge and
learning: that recognizing the “right answer” is the primary goal of education.

Critics also express concern that multiple-choice tests lead to “multiple-choice”
teaching, that is, a focus on acquisition of facts rather than an emphasis on
understanding and the thoughtful application of knowledge (Mitchell 1992; Perrone
1991; Wiggins 1992). With a recognition of their advantages and limitations,
teachers may appropriately incorporate selected-response formats as part of a
balanced menu of assessment approaches.

The development of fair and valid tests using selected-response items is a
challenging and time-consuming process. While a complete treatment of this topic is
beyond the scope of this publication, several excellent resources are available to
assist teachers in designing assessments using selected-response formats. For more
information, see Carlson (1985), Haladyna (1994), and Nitko (1983).
Constructed-response format refers to those assessment activities that call upon
students to construct a response, create a product, or perform a demonstration to
show what they know and can do. In this book, constructed-response formats include
brief constructed responses, student products, student performances, and
process-focused assessments (see Figure 2, second through fifth columns). Other
writers refer to such assessments as “performance assessments.”
Brief Constructed Response



Unlike selected-response items that call for a selection from given alternatives,
brief constructed-response items ask students to generate brief responses to
open-ended questions, problems, or prompts. Short written answers and visual
representations (e.g., concept map, flow chart, graph) are examples of widely used
brief constructed-response methods. While brief constructed-response items may seek
a correct or acceptable response (e.g., fill in the blank), they are more likely to
yield a range of responses. Thus, the evaluation of student responses requires
judgment, guided by criteria. This approach may be used for assessing declarative
knowledge and procedural proficiency. In addition, assessments using brief
constructed-response items can provide insight into understanding and reasoning when
students are asked to show their work and explain or defend their answers in
writing.

Assessments using brief constructed-response items offer several advantages. They 
require less time to administer than other types of assessments using constructed- 
response formats. Since they elicit short responses, several brief constructed-response 
items may be used to assess multiple content standards. Evaluation of student responses 
is straightforward, guided by criteria and model responses. 

Brief constructed-response items are limited in their ability to adequately assess
attitudes, values, or habits of mind. In addition, they require judgment-based
evaluation, which takes time and introduces potential problems of scoring
reliability and fairness. Teachers are cautioned against regularly reusing brief
constructed-response items for summative assessments, since students may then give
memorized responses to known questions and tasks.

Performance-Based Assessment 

Performance-based assessments include student products, student performances,
and process-focused assessments. Performance-based assessments require students
to apply knowledge and skills rather than simply to recall and recognize. Thus,
performance-based assessments are more likely to reveal student understanding. They
are well suited to assessing application of content-specific knowledge, integration
of knowledge across subject areas, and life-long learning competencies such as
effective decision making, communication, and cooperation (Shepard 1989).

The current interest in performance-based methods has popularized additional
assessment terms, such as authentic assessment, rubric, anchors, and standards. The
term authentic assessment, popularized by Grant Wiggins (Wiggins 1989), is used to
describe performance-based assessments that engage students in applying knowledge
and skills in ways that they are used in the “real world.” According to Wiggins,
authentic assessments should also reflect good instructional practice in ways that
make “teaching to the test” legitimate and worthwhile.

Performance-based assessments generally do not yield a single correct answer or
solution but allow for a wide range of responses. Thus, evaluations of student
responses, products, and performances must be based on judgments. The evaluative
judgments are guided by criteria that define the desired elements of quality
(Ferrara, Goldberg, and McTighe 1995). One widely used scoring tool is the rubric,
which is used to evaluate the quality of constructed-response products and
performances. Rubrics consist of a fixed measurement scale (e.g., four-point) and a
list of criteria that describe the characteristics for each score point. Rubrics are
frequently accompanied by representative examples of student products or
performances that illustrate each of the points on the scale. These examples are
called anchors.

The term standards is frequently used in conjunction with performance-based
assessments. There are three distinct ways in which the term is used: (1) content
standards, which specify what students should know and be able to do; (2)
performance standards, which set expectations about how well students should
perform; and (3) opportunity-to-learn standards, having to do with the necessary
resources and conditions for effective teaching and learning. Performance-based
assessments call for decisions about content standards as well as expected standards
for performance (Diez 1993). Three primary types of performance-based assessments
are products, performances, and process-focused assessments.

Products. Student products provide tangible indicators of the application of
knowledge and skills. Many educators believe that product assessment is especially
“authentic” because it closely resembles real work outside of school. Teachers may
evaluate written products (e.g., essays, research papers, laboratory reports), visual
products (e.g., two- and three-dimensional models, displays, videotapes), aural
products (e.g., an audiotape of an oral presentation), and other types of products to
determine degrees of proficiency or levels of quality.

Product assessment calls for the selection or development of criteria for evaluation.
The criteria are incorporated into a scoring rubric, rating scale, or checklist. Many
teachers recognize that evaluation criteria also serve an instructional purpose:
providing students with a clear focus on elements of quality to guide their work.
When the criteria are made public, students may be involved in using them in
peer- and self-evaluation of products.

One application of product assessment is systematically collecting representative
samples of student work over time in portfolios. Portfolios allow teachers, students,
parents, and others to observe development and growth in learning. Portfolio
assessment has been widely used over the years in the visual arts, architecture, and
technical areas.

In recent years teachers have increasingly used portfolios to document learning 
in other subject areas, especially the language arts. For additional information on the 
use of portfolios in the classroom, see Student Portfolios (National Education 
Association 1993). 

The use of products and portfolios can be appealing. When students are given
opportunities to produce authentic products, they often become more engaged in, and
committed to, their learning. Unlike standardized assessments that require uniform
student responses, performance-based assessments in which students create a product
allow students to express their individuality. Product assessments also indicate what
students can do, while revealing what they need to learn or improve. When teachers
share the criteria used to evaluate products, students know the elements of quality
that will serve as a guide for peer- and self-evaluation. Previously developed
products can serve an instructional purpose when they are presented as models of
excellence for students (McTighe 1997; Wiggins 1992).

Despite their benefits, product assessments have their drawbacks. Criteria for judging
the products must be identified, and product evaluation can be a time-consuming
process. In addition, teachers evaluating student products must take care that their
judgments are not unduly influenced by extraneous variables, such as neatness or
spelling. Practicality must also be considered. The time required to develop quality
products may compete with other instructional priorities. Product assessments also
require resources, including funds for materials and space for display and storage.

Performances. Using performance assessments, teachers are able to observe directly
the application of desired skills and knowledge. Performance assessments are among
the most authentic types of student assessments because they can replicate the kinds
of actual performances occurring in the world outside of school. Performances have
been used widely to assess learning in certain disciplines, such as vocal and
instrumental music, physical education, speech, and theater, where performance is the
natural focus of instruction. However, teachers in other subjects can include
performances, such as oral presentations, demonstrations, and debates, as part of an
array of assessment methods.

As with product assessments, teachers must develop criteria and scoring tools, such
as rubrics, rating scales, or checklists, to evaluate student performances. Students
gain additional instructional value when they apply the scoring tools for peer- and
self-evaluation. Such involvement helps students to internalize the elements of
quality embedded in the criteria. Many teachers have observed that students are
motivated to put forth greater effort when they perform before “real” audiences of
other students, staff, parents, or expert judges. Schools also benefit from positive
public relations when students perform for the community.

Despite their positive features, performance assessments can be time- and
labor-intensive for students and teachers. Time must be allocated for rehearsal as
well as for the actual performances. The evaluation of performances is particularly
susceptible to evaluator biases, making fair, valid, and reliable assessment a
challenge.

Process-Focused Assessments.

Process-focused assessments provide information on students’ learning strategies and
thinking processes. Rather than focusing on tangible products or performances, this
approach focuses on gaining insights into the underlying cognitive processes used by
students. A variety of process-focused assessments are routinely used as a natural
part of teaching. For example, teachers may elicit students’ thinking processes using
oral questions such as “How are these two things alike and different?” or by asking
students to “think out loud” as they solve a problem or make a decision. Teachers may
ask students to document their thinking over time by keeping a learning log.
Teachers can also learn about students’ thinking processes by observing students as
they function in the classroom. This “kid watching” method is especially well suited
to assessing the development of attitudes or habits of mind, such as persistence.

Process-focused assessments are formative in that they provide diagnostic
information to teachers and feedback to students. They may also support the
development of students’ metacognition by heightening their awareness of cognitive
processes and worthwhile strategies. Process-focused assessment methods are typically
used over time rather than on single occasions. Thus, they are rarely used in
standardized, high-stakes evaluations of students.



V.

EVALUATION METHODS AND ROLES

In addition to making choices about classroom assessment methods, teachers 
should consider options for evaluating student work (see Figure 3). 

One question teachers must ask is: “How will we evaluate student knowledge and
proficiency?” They should determine evaluation methods largely by the assessment
approach and the nature of the student responses to the assessment item or task.
Selected-response format items and some brief constructed-response items (e.g., fill
in the blank) yield a single correct or best answer. Most often teachers score such
items using an answer key. Sometimes they ask students to “bubble in” their answers on
an answer sheet that can be scanned by machine or hand-scored by overlaying a scoring
template. Scoring of selected-response format items is relatively quick, easy, and
objective.

Assessments using constructed-response formats elicit a range of responses, products,
or performances that reflect varying degrees of quality and different levels of
proficiency. Because such assessments typically do not have a single correct answer,
teachers must rely on judgment-based methods to evaluate responses to these
open-ended assessments. Four primary types of evaluation methods are used with
constructed-response formats: scoring rubrics, rating scales, checklists, and written
and oral comments.

A scoring rubric consists of evaluative criteria, a fixed scale (e.g., four or six
points), and descriptive terms for discriminating among different degrees of
understanding, quality, or proficiency. The term rubric has its origins in the Latin
word rubrica, meaning “red earth used to mark something of significance.” Today,
educators use the term rubric to communicate the important qualities in a product or
performance.

Scoring rubrics can be holistic (providing an overall impression of the elements of
quality and levels of performance in a student’s work) or analytic (indicating the
level of performance of a student’s work on two or more separate traits). For
example, the reading rubric in Figure 4 is a holistic rubric for evaluating reading
comprehension, and the oral presentation rubric in Figure 5 is an analytic rubric.
Notice that in the analytic rubric, four traits (content, organization, delivery,
language conventions) are evaluated independently.
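
To make the distinction concrete, the following minimal Python sketch (an illustration only, not part of any scoring system described here) records one student’s results on an analytic rubric as separate trait scores, using the four traits of the oral presentation rubric in Figure 5 and its 1 to 4 scale. The class name `OralPresentationScores`, the `feedback` helper, the assumed proficiency target of 3, and the sample scores are all hypothetical. The point is that an analytic rubric keeps one score per trait, so strengths and weaknesses stay visible, whereas a holistic rubric would record only a single overall judgment.

```python
from dataclasses import dataclass


@dataclass
class OralPresentationScores:
    """Hypothetical record of one student's analytic scores (1-4 scale),
    one score per trait of the oral presentation rubric."""
    content: int
    organization: int
    delivery: int
    language_conventions: int

    def __post_init__(self) -> None:
        # Reject any score that falls outside the rubric's fixed scale.
        for trait, score in vars(self).items():
            if not 1 <= score <= 4:
                raise ValueError(f"{trait} score {score} is outside the 1-4 scale")


def feedback(scores: OralPresentationScores, target: int = 3) -> dict[str, list[str]]:
    """Split traits into strengths and areas needing attention.

    `target` is an assumed score point the teacher treats as proficient.
    """
    strengths = [trait for trait, s in vars(scores).items() if s >= target]
    needs_attention = [trait for trait, s in vars(scores).items() if s < target]
    return {"strengths": strengths, "needs_attention": needs_attention}


# A holistic rubric would record only one number for this same presentation.
student = OralPresentationScores(content=4, organization=3, delivery=2, language_conventions=3)
print(feedback(student))
```

Here delivery (scored 2) is flagged as needing attention while content, organization, and language conventions are strengths; a single holistic score for the same presentation would not show that difference.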
* These evaluation methods may be used for some brief constructed-response items (e.g., fill in the blank, short answer).





Figure 4

Reading Rubric

4   Reader displays a sophisticated understanding of the text with substantial
    evidence of constructing meaning. Multiple connections are made between the text
    and the reader’s ideas/experiences. Interpretations are sophisticated and directly
    supported by appropriate text references. Reader explicitly takes a critical
    stance (e.g., analyzes the author’s style, questions the text, provides alternate
    interpretations, views the text from multiple perspectives).

3   Reader displays a solid understanding of the text with clear evidence of
    constructing meaning. Connections are made between the text and the reader’s
    ideas/experiences. Interpretations are made and generally supported by appropriate
    text references. Reader may reveal a critical stance toward the text.

2   Reader displays a partial understanding of the text with some evidence of
    constructing meaning. A connection may be made between the text and the reader’s
    ideas/experiences, but it is not developed. Interpretations are not made and/or
    not supported by appropriate text references. Reader shows no evidence of a
    critical stance toward the text.

1   Reader displays a superficial understanding of the text with limited evidence of
    constructing meaning. No connections are made between the text and the reader’s
    ideas/experiences. Reader provides no interpretations or evidence of a critical
    stance.

0   Reader displays no evidence of text comprehension or constructing meaning.



Holistic rubrics are most appropriately used for summative purposes (such as the
evaluation provided at the conclusion of a unit or a course), where the goal is to
provide an overall picture of student performance. Most report card grades represent
holistic evaluation, since a variety of “subscores” (tests, quizzes, performance
tasks, homework, classwork, etc.) are collapsed into a single symbol, the letter
grade, for each subject.
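
The collapsing of subscores described above can be sketched as a weighted average. In this minimal Python illustration, the category weights and the two students’ percentages are hypothetical assumptions, not a recommended grading policy. It shows that two students with opposite profiles can end up with the same overall figure, which is exactly the detail a single holistic grade hides.

```python
# Hypothetical report card weights, in percent; they must total 100.
WEIGHTS = {"tests": 40, "quizzes": 20, "performance_tasks": 30, "homework": 10}
assert sum(WEIGHTS.values()) == 100


def overall_percent(subscores: dict[str, float]) -> float:
    """Collapse category percentages into one weighted percentage."""
    return sum(WEIGHTS[category] * subscores[category] for category in WEIGHTS) / 100


# Strong on tests and homework, weaker on quizzes and performance tasks.
student_a = {"tests": 90, "quizzes": 70, "performance_tasks": 70, "homework": 90}
# The reverse profile.
student_b = {"tests": 70, "quizzes": 90, "performance_tasks": 90, "homework": 70}

print(overall_percent(student_a))  # 80.0
print(overall_percent(student_b))  # 80.0
```

Both students collapse to 80.0, so a report card showing only that figure (or the letter it maps to) cannot tell them apart.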

Figure 5

Oral Presentation Rubric

Traits: Content, Organization, Delivery, Language Conventions

4   Content: (Varies by assignment)
    Organization: Coherent organization throughout; logical sequence; smooth
    transitions; effective introduction and conclusion
    Delivery: Excellent volume; fluent delivery with varied intonation; effective body
    language and eye contact
    Language Conventions: Highly effective use of language enhances the message; few,
    if any, grammatical mistakes

3   Content: (Varies by assignment)
    Organization: Good organization generally but with some break in the logical flow
    of ideas; clear transitions; identifiable introduction and conclusion
    Delivery: Adequate volume and intonation; generally fluent; generally effective
    body language and eye contact
    Language Conventions: Generally effective use of language supports the message;
    minor grammatical errors do not interfere with message

2   Content: (Varies by assignment)
    Organization: Flawed organization; ideas not developed; weak transitions;
    ineffective conclusion
    Delivery: Volume is too low or too loud; delivery is not fluent; body language and
    eye contact do not enhance message
    Language Conventions: Use of language not always aligned with the message;
    grammatical errors may interfere with message

1   Content: (Varies by assignment)
    Organization: Lack of organization; flow of ideas difficult to follow; no evidence
    of transitions; no introduction or conclusion
    Delivery: Message cannot be understood due to low volume; strained delivery;
    ineffective body language; lack of eye contact
    Language Conventions: Major grammatical errors make the message very difficult or
    impossible to follow



Holistic rubrics have their place, but teachers should primarily employ analytic
rubrics for day-to-day evaluation in their classrooms. Because they identify and
evaluate particular traits, analytic rubrics provide more detailed and specific
feedback to students about the strengths of their performance and the areas needing
attention. If the goal is to improve student learning, not simply grade it, then such
specific feedback is needed. How can students improve their research skills, for
instance, if all they receive is a “3” on a holistic rubric (or a “B-” on a research
report)? Such evaluations provide little meaningful guidance about how to do a better
job in the future. An analytic rubric, on the other hand, offers greater specificity.
For example, a student who receives the descriptive comments “uses several appropriate
sources to gather information on the topic” and “needs to document all sources using
standard bibliographic notation” on an analytic rubric is informed about a strength
of the research (use of multiple sources) and a weakness (lack of complete
documentation). The intent of such feedback is to encourage the student to become
more attentive to the importance of careful source documentation on future research
projects.

Figure 6

Art Rubric

3   - Identifies three or more relevant differences between the work of Matisse and
      van Allsburg (e.g., use of color, level of detail/simplification, use of line
      and shape, materials, process)
    - Identifies a preference for one artist’s style
    - Supports preference with two or more well-stated reasons citing specific
      examples from the artist’s work
    - Uses a variety of art vocabulary terms appropriately

2   - Identifies two relevant differences between the work of Matisse and
      van Allsburg
    - Identifies a preference for one artist’s style
    - Supports preference with one reason citing an example from the artist’s work
    - Uses one or two art vocabulary terms appropriately

1   - Does not clearly identify significant differences between the work of Matisse
      and van Allsburg
    - Identifies a preference for one artist’s style, but does not support preference
      with reasons or examples
    - Does not use art vocabulary terms appropriately

Rather than choosing between these two types of rubrics, teachers can use both during
a course or unit of study. They can use the analytic rubric(s) “along the way” to
inform teaching and guide student practice and revision, and they can use the
holistic rubric(s) at the conclusion of a performance task or unit assessment to
provide an overall evaluation of student knowledge and proficiency.

In addition to being analytic or holistic, rubrics also may be generic or
task-specific. A generic rubric provides general criteria for evaluating a student’s
performance in a given performance area. The rubrics shown in Figures 4 and 5 are
generic rubrics, since they may be used to evaluate a variety of responses to reading
and oral presentations, respectively. In contrast, a task-specific rubric is designed
for use with a particular assessment task. For example, the art rubric in Figure 6 is
used to assess the task of comparing the styles and techniques of two artists
(Matisse and van Allsburg) and indicating a preference. Notice that a task-specific
rubric, such as this one, cannot be used to evaluate responses to different
performance tasks.

Generic rubrics can be applied repeatedly within a given performance area, such as
mathematical problem solving, persuasive writing, and research. Rather
than creating a new rubric for each and every performance task, the same rubric can 
be taught to students, posted in the room, and used throughout the year (and often 
across grade levels). With repeated use, the criteria contained in the generic rubric 
can be internalized by students so that they are better able to consider the qualities 
of effective performance while they are working, as well as to evaluate their own 
work when they are finished. 

There are times, however, when a task-specific rubric will be preferable. For 
instance, task-specific rubrics tend to yield greater reliability (consistency) when 
used by different teachers. Thus, a department or grade-level team might employ a 
task-specific rubric for use with a common performance task or final exam given by 
more than one teacher. Task-specific rubrics can be customized from generic 
rubrics. 
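
One simple way a department or grade-level team might check the consistency that a shared task-specific rubric is meant to provide is to have two teachers score the same set of student papers and compute how often they agree. The minimal Python sketch below reports exact agreement (the same score point) and adjacent agreement (within one point). The function name and the ten pairs of scores are hypothetical, and percent agreement is only one of several consistency measures; statistics such as Cohen’s kappa additionally correct for agreement expected by chance.

```python
def agreement_rates(rater_a: list[int], rater_b: list[int]) -> tuple[float, float]:
    """Return (exact, adjacent) agreement between two raters' rubric scores.

    Exact: both raters gave the same score point.
    Adjacent: scores differ by at most one point (includes exact matches).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(rater_a)
    exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n
    return exact, adjacent


# Hypothetical scores from two teachers on the same ten papers (4-point rubric).
teacher_1 = [4, 3, 3, 2, 1, 4, 2, 3, 3, 2]
teacher_2 = [4, 3, 2, 2, 1, 3, 2, 3, 4, 1]

exact, adjacent = agreement_rates(teacher_1, teacher_2)
print(f"exact: {exact:.0%}, adjacent: {adjacent:.0%}")
```

With these invented scores the teachers agree exactly on six of ten papers and never differ by more than one point, so the sketch prints `exact: 60%, adjacent: 100%`.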

Rubrics are most effectively used for evaluation or instruction when they are 
accompanied by examples of responses for each score point. These examples or 
anchors provide tangible illustrations of the various points on the rating scale. 
Perhaps the greatest advantage of rubrics is their clear delineation of the elements 
of quality. They provide students with clear performance targets, expectations about 
what is most important, and criteria for evaluating and improving their own work. 
They provide teachers with specific criteria for reliably evaluating student
responses, products, or performances; a “tool” for increasing the consistency of
evaluation among teachers; and clear targets for instruction.

These evaluation methods require time to collect or develop rubrics, to identify
representative anchors, to develop proficiency in applying them reliably, and to use
them for evaluating student responses, products, and performances. Nonetheless, some
schools and districts have recognized the significant professional development
benefits of providing opportunities for teachers to work together on scoring student
responses, products, and performances and identifying anchors.

Rating scales may also be used to evaluate responses to open-ended questions and
tasks. Bipolar rating scales (see Figure 7), for example, are widely used on
questionnaires and can be applied to educational assessments as well. Such a scale
might be used in conjunction with evaluations related to program selection (e.g.,
special education placement) or for peer evaluation of a product or performance
(e.g., “This oral presentation achieves its stated purpose.”).

Checklists contain categories (i.e., specific features or dimensions) for evaluation
and rating options for each category. The rating options may offer a simple “yes” or
“no” to indicate the presence or absence of each dimension, or a narrow scale, such
as “never,” “rarely,” or “frequently.” Checklists are easy to use and efficient
evaluation tools. They can be used while teaching a lesson or leading a discussion and
are especially useful when observing students at work. Checklists may also be used as
guides by students, individually or in groups, while they engage in performance
activities.

While rating scales and checklists are simple to apply in the classroom, they
generally do not provide the detailed, explicit criteria found in rubrics. Thus, they
are open to differing interpretations and greater subjectivity when used to evaluate
student products and performances.

Written and oral comments can be effective in evaluating student work because they
enable teachers to communicate clearly and directly with their students about
elements of quality, expected standards of performance, areas of strength, and needed
improvements. These methods allow teachers to provide evaluative feedback to students
on a personal level. Written and oral comments can require a great deal of teacher
time and are especially demanding for secondary teachers working with one hundred or
more students per day. The effectiveness of personal comments may be diminished if
teachers provide only negative feedback (identifying errors or problems), make
nonspecific positive comments that do not acknowledge particular aspects of student
effort and work, or make comments that do not address all important elements of
quality.

Figure 7

Sample Bipolar Rating Scale

Teachers must also ask: “Who will be involved in the evaluation?” As always, this
guiding question should be answered with content standards, purposes, audiences, and
methods in mind. The question also brings to mind the opportunity to involve others
in the evaluation process. Teachers may involve other staff members, parents, or
community experts in the evaluation of student products (e.g., science fair projects)
and performances (e.g., public-speaking exhibitions). They may also involve students.
When students are engaged in applying criteria for self- and peer-evaluation, they
begin to internalize elements of quality and performance standards in ways that can
lead to improvements in the quality of their work and learning.




VI. 

COMMUNICATION AND FEEDBACK METHODS



After evaluations are made, teachers must ask: “How will we communicate assessment
results?” A variety of methods can be used, including numerical scores, letter
grades, verbal and written reports, scales, and checklists (see Figure 3). The choice
of communication methods should be determined by assessment purposes and methods,
evaluation methods, and especially the audience for the assessment information.

Numerical scores (e.g., percentage correct or number of points earned on a classroom
quiz) and letter grades are widely used methods for communicating the results of
classroom assessments. Both methods are efficient to use and succinct, but numerical
scores and grades, by themselves, do not explicitly communicate the elements of
quality and standards of performance that they are meant to reflect. For example,
saying that 70 percent correct is a “C” can mean one thing on an easy task and
something different on a difficult task, and it does not make clear what a student
knows and can do. Likewise, when students are graded “on a curve,” their knowledge or
performance level is communicated in relation to other students in the class, not in
terms of established criteria and standards.

Developmental and proficiency scales are generally more informative than numerical
scores and grades because they contain descriptions of different degrees of quality
and levels of performance (see Figure 8 for an example of a developmental scale for
reading). Information about student learning presented in terms of developmental or
proficiency levels can be especially meaningful to parents. Recognizing this fact,
some schools and districts have revised their report cards, especially for the
primary grades, to incorporate features of developmental and proficiency scales.
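
The contrast drawn earlier in this section between a fixed percentage cutoff and grading “on a curve” can be illustrated with a minimal Python sketch. The 90/80/70/60 cutoffs, the curve rule (a letter determined by the share of the class scoring higher), and both sets of class scores are hypothetical assumptions chosen only for illustration. Under the fixed cutoffs, 70 percent always maps to a “C” no matter how hard the task was; under the curve, the same 70 percent earns an “A” or a “D” depending on classmates’ scores. Neither result says what the student actually knows and can do.

```python
def criterion_grade(percent: float) -> str:
    """Grade against fixed cutoffs (hypothetical 90/80/70/60 scale)."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if percent >= cutoff:
            return letter
    return "F"


def curved_grade(percent: float, class_scores: list[float]) -> str:
    """Grade by standing relative to classmates (hypothetical curve).

    The share of the class scoring strictly higher decides the letter:
    under 20% -> A, under 50% -> B, under 80% -> C, otherwise D.
    """
    higher = sum(s > percent for s in class_scores) / len(class_scores)
    for limit, letter in [(0.2, "A"), (0.5, "B"), (0.8, "C")]:
        if higher < limit:
            return letter
    return "D"


easy_task = [95, 92, 90, 88, 85, 80, 78, 75, 72, 70]  # 70 is the lowest score
hard_task = [70, 65, 60, 58, 55, 50, 48, 45, 40, 35]  # 70 is the highest score

print(criterion_grade(70))           # C, whatever the rest of the class did
print(curved_grade(70, easy_task))   # D
print(curved_grade(70, hard_task))   # A
```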

Checklists can also be effective for communicating assessment results because they
present ratings on identified criteria or elements of quality. They are a quick and
efficient method for providing direct and timely feedback to students. However,
checklist developers must be careful to avoid poorly defined categories, such as
creativity, that are open to diverse interpretations.

Figure 8 

Developmental Reading Scale 

Emergent Reader 

□ follows along in the text when adult reads 

□ is aware of relationship of printed text to oral language

□ uses picture cues when recalling story 

□ pretends to read; memorizes favorite stories 

Beginner Reader 

□ reads word-for-word; struggles with unfamiliar material 

□ has limited sight vocabulary of one- and two-syllable words 

□ attempts to pronounce and figure out meaning of new words 

□ demonstrates comprehension of simple text 

□ occasionally monitors comprehension and self-corrects 

Competent Reader 

□ reads familiar material comfortably 

□ has large sight vocabulary 

□ uses context clues to figure out meaning of unfamiliar words

□ actively constructs meaning 

□ regularly monitors comprehension and self-corrects 

Fluent Reader 

□ reads fluently with expression 

□ has extensive sight vocabulary 

□ readily determines meaning of unfamiliar words using context clues 

□ reads a wide variety of materials with understanding 

□ independently monitors comprehension; appropriately 
applies comprehension strategies 

Written comments, narrative reports, verbal reports, and conferences can be effective
communication methods because they provide opportunities to clearly and directly
connect student effort and performance to elements of quality and standards of
performance. They also allow teachers to provide more individualized and personal
feedback than the other communication methods. Regrettably, the time-consuming nature
of these methods often limits their use, especially for secondary teachers, who
typically work with more students per teacher.




AFTERWORD 

Assessment is an essential component of the teaching and learning process. Without
effective classroom assessment, it is impossible for teachers to know whether
students are “hitting the target,” that is, learning what is important for them to
learn. However, the significance of classroom assessment extends beyond the role of
measuring learning. What we assess, how we assess and evaluate, and how we
communicate results send a clear message to students about what is worth learning,
how it should be learned, what elements of quality are most important, and how well
we expect them to perform. By considering the key questions and principles presented
here, teachers will be better equipped to develop and use classroom assessments that
provide fair, valid, and reliable information that will inform teaching and promote
learning.





GLOSSARY 



analytic scoring — scoring procedure in which responses, products, or performances
are evaluated for selected dimensions, with each dimension receiving a separate score.
For example, a piece of writing may be evaluated on several categories, such as
organization, use of details, attention to audience, and language usage and
mechanics. Analytic scores may be weighted and totaled.

anchor(s) — representative responses, products, or performances used to illustrate
each point on a scoring scale. They are also referred to as “models” and
“range-finder papers.” Anchors for the highest score point are sometimes referred to
as exemplars.

assessment — any systematic basis for making inferences about characteristics of
people, usually based on various sources of evidence; the global process of
synthesizing information about individuals in order to understand and describe them
better (Brown 1983).

authentic assessment — refers to assessment tasks that evoke demonstrations of
knowledge and skills in ways that they are applied in the “real world.” Ideally,
authentic assessment tasks also engage students and reflect best instructional
activities. Thus, teaching to the task may be desirable.

content standard — a goal statement specifying desired knowledge, skills or 
processes, and attitudes to be developed as a result of educational experiences. 

criteria — guidelines, rules, or principles by which student responses, products, or 
performances are evaluated. 

criterion referenced — an approach for describing a student’s performance on an 
assessment according to established criteria. 

evaluation — judgment regarding the quality, value, or worth of a response, product, 
or performance based upon established criteria. 

formative assessment — ongoing, diagnostic assessment providing information 
(feedback) to guide instruction and improve student performance. 

generalizability — the extent to which responses, products, or performances sampled
by a set of assessment activities are representative of the broader domain being
assessed.

holistic scoring — a scoring procedure yielding a single score based upon an overall
impression of a response, product, or performance.

indicator — a specific description of an outcome in terms of observable and
assessable behaviors. An indicator specifies what a person who possesses the
qualities articulated in a content standard knows or can do. Generally, several
indicators are needed to adequately describe each content standard.

interdisciplinary or integrated assessment — assessment that uses tasks that test
students’ abilities to apply concepts, principles, skills, and processes from two or
more subject disciplines to a central question, theme, issue, or problem.

norm referenced — an approach for describing a student’s performance on an
assessment by comparison to a norm group.

opportunity-to-learn standards — the conditions and resources necessary for 
teachers and schools to meet higher standards for students. 

performance-based assessment (or performance assessment) — an assessment activity
that requires students to construct a response, create a product, or perform a
demonstration. Performance-based assessments generally do not yield a single correct
answer or solution but allow for a wider range of responses. Thus, evaluations of
student responses, products, and performances are based on judgments guided by
criteria.

performance standard — an established level of achievement, quality, or proficiency.
Performance standards set expectations about how much students should know and how
well students should perform.

performance task — an assessment activity, or set of activities, related to one or
more content standards, that elicits one or more responses to a question or problem.

portfolio — a purposeful, integrated collection of student work showing effort,
progress, or achievement in one or more areas (adapted from Paulson, Paulson, and
Meyer 1991). Since they feature works selected over time, portfolios are well suited
to assess student growth and development.

primary trait(s) scoring — a scoring procedure in which responses, products, or 
performances are evaluated by limiting attention to a single criterion. These
individual criteria are based upon the trait determined to be essential for a successful
performance on a given task. For example, persuasiveness might be the primary trait
being evaluated in a note to a principal urging a change in a school rule. Scorers
would attend only to that trait. 

proficiency — having or demonstrating a high degree of knowledge or skill in a
particular area.

reliability — the degree to which an assessment yields dependable and consistent
results. 

rubric — a scoring tool used to evaluate a student’s performance in a content area.
Rubrics consist of a fixed measurement scale (e.g., a four-point scale) and a list of 
criteria that describe the characteristics of products or performances for each score 
point. Rubrics are frequently accompanied by examples (anchors) of student 
responses, products, or performances to illustrate each of the points on the scale. 

standardized assessment — an assessment that uses a set of consistent procedures
for constructing, administering, and scoring. The goal of standardization is to ensure
that all students are assessed under uniform conditions so that interpretation of their
performance is comparable and not influenced by differing conditions (Brown 1983).

summative assessment — culminating assessment for a unit, grade level, or course 
of study providing a status report on mastery or degree of proficiency according to 
identified content standards. 

test — a set of questions or situations designed to elicit responses that permit an 
inference about what a student knows or can do. Tests generally utilize a
paper-and-pencil format, occur within established time limits, restrict access to resources (e.g.,
reference materials), and yield a limited range of acceptable responses. 

validity — the degree to which an assessment measures what it is intended
to measure.